AITopics | music score

Collaborating Authors

music score

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LAPS-Diff: A Diffusion-Based Framework for Singing Voice Synthesis With Language Aware Prosody-Style Guided Learning

Dhar, Sandipan, Gupta, Mayank, Rao, Preeti

arXiv.org Artificial IntelligenceDec-1-2025

The field of Singing Voice Synthesis (SVS) has seen significant advancements in recent years due to the rapid progress of diffusion-based approaches. However, capturing vocal style, genre-specific pitch inflections, and language-dependent characteristics remains challenging, particularly in low-resource scenarios. To address this, we propose LAPS-Diff, a diffusion model integrated with language-aware embeddings and a vocal-style guided learning mechanism, specifically designed for Bollywood Hindi singing style. We curate a Hindi SVS dataset and leverage pre-trained language models to extract word and phone-level embeddings for an enriched lyrics representation. Additionally, we incorporated a style encoder and a pitch extraction model to compute style and pitch losses, capturing features essential to the naturalness and expressiveness of the synthesized singing, particularly in terms of vocal style and pitch variations. Furthermore, we utilize MERT and IndicWav2Vec models to extract musical and contextual embeddings, serving as conditional priors to refine the acoustic feature generation process further. Based on objective and subjective evaluations, we demonstrate that LAPS-Diff significantly improves the quality of the generated samples compared to the considered state-of-the-art (SOTA) model for our constrained dataset that is typical of the low resource scenario.

artificial intelligence, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2507.04966

Genre: Research Report > New Finding (0.93)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis

Sui, Kehan, Xiang, Jinxu, Jin, Fang

arXiv.org Artificial IntelligenceOct-28-2024

Singing voice synthesis (SVS) aims to produce high-fidelity singing audio from music scores, requiring a detailed understanding of notes, pitch, and duration, unlike text-to-speech tasks. Although diffusion models have shown exceptional performance in various generative tasks like image and video creation, their application in SVS is hindered by time complexity and the challenge of capturing acoustic features, particularly during pitch transitions. Some networks learn from the prior distribution and use the compressed latent state as a better start in the diffusion model, but the denoising step doesn't consistently improve quality over the entire duration. We introduce RDSinger, a reference-based denoising diffusion network that generates high-quality audio for SVS tasks. Our approach is inspired by Animate Anyone, a diffusion image network that maintains intricate appearance features from reference images. RDSinger utilizes FastSpeech2 mel-spectrogram as a reference to mitigate denoising step artifacts. Additionally, existing models could be influenced by misleading information on the compressed latent state during pitch transitions. We address this issue by applying Gaussian blur on partial reference mel-spectrogram and adjusting loss weights in these regions. Extensive ablation studies demonstrate the efficiency of our method. Evaluations on OpenCpop, a Chinese singing dataset, show that RDSinger outperforms current state-of-the-art SVS methods in performance.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2410.21641

Country: Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (1.00)

Industry:

Media > Music (0.35)
Leisure & Entertainment (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment

Hong, Zhiqing, Huang, Rongjie, Cheng, Xize, Wang, Yongqi, Li, Ruiqi, You, Fuming, Zhao, Zhou, Zhang, Zhimeng

arXiv.org Artificial IntelligenceMay-20-2024

A song is a combination of singing voice and accompaniment. However, existing works focus on singing voice synthesis and music generation independently. Little attention was paid to explore song synthesis. In this work, we propose a novel task called text-to-song synthesis which incorporating both vocals and accompaniments generation. We develop Melodist, a two-stage text-to-song method that consists of singing voice synthesis (SVS) and vocal-to-accompaniment (V2A) synthesis. Melodist leverages tri-tower contrastive pretraining to learn more effective text representation for controllable V2A synthesis. A Chinese song dataset mined from a music website is built up to alleviate data scarcity for our research. The evaluation results on our dataset demonstrate that Melodist can synthesize songs with comparable quality and style consistency. Audio samples can be found in https://text2songMelodist.github.io/Sample/.

accompaniment, melodist, synthesis, (13 more...)

arXiv.org Artificial Intelligence

2404.09313

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.64)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

Add feedback

ChatMusician: Understanding and Generating Music Intrinsically with LLM

Yuan, Ruibin, Lin, Hanfeng, Wang, Yi, Tian, Zeyue, Wu, Shangda, Shen, Tianhao, Zhang, Ge, Wu, Yuhang, Liu, Cong, Zhou, Ziya, Ma, Ziyang, Xue, Liumeng, Wang, Ziyu, Liu, Qin, Zheng, Tianyu, Li, Yizhi, Ma, Yinghao, Liang, Yiming, Chi, Xiaowei, Liu, Ruibo, Wang, Zili, Li, Pengfei, Wu, Jingcheng, Lin, Chenghua, Liu, Qifeng, Jiang, Tao, Huang, Wenhao, Chen, Wenhu, Benetos, Emmanouil, Fu, Jie, Xia, Gus, Dannenberg, Roger, Xue, Wei, Kang, Shiyin, Guo, Yike

arXiv.org Artificial IntelligenceFeb-25-2024

While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language. ChatMusician can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers. Interestingly, endowing musical abilities does not harm language abilities, even achieving a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc, surpassing GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 on zero-shot setting by a noticeable margin. Our work reveals that LLMs can be an excellent compressor for music, but there remains significant territory to be conquered. We release our 4B token music-language corpora MusicPile, the collected MusicTheoryBench, code, model and demo in GitHub.

abc notation, music, preprint arxiv, (15 more...)

arXiv.org Artificial Intelligence

2402.16153

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

WikiMT++ Dataset Card

Zhou, Monan, Wu, Shangda, Wang, Yuan, Li, Wei

arXiv.org Artificial IntelligenceSep-23-2023

Table 1 shows the specific names and number of classes of genre and emotion labels. WikiMT++ is an expanded and refined version of WikiMusicText (WikiMT), featuring 1010 curated lead 2.1 Attributes from WikiMT or Information sheets in ABC notation. To expand application scenarios of The titles, artists, genres, and descriptions are directly inherited WikiMT, we add both objective (album, lyrics, video) and from WikiMT. However, as they were originally subjective emotion (12 emotion adjectives) and emo_4q curated from openly accessible sources, potential constraints (Russell 4Q) attributes, enhancing its usability for music and wrongs still exist. For better precision and information retrieval, conditional music generation, automatic completeness, we update these attributes through CLaMP composition, and emotion classification, etc.

dataset, information, lyric, (16 more...)

arXiv.org Artificial Intelligence

2309.13259

Country:

Europe > Italy > Lombardy > Milan (0.05)
Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Beijing > Beijing (0.05)

Genre: Research Report (0.40)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.38)

Add feedback

TrOMR:Transformer-Based Polyphonic Optical Music Recognition

Li, Yixuan, Liu, Huaping, Jin, Qiang, Cai, Miaomiao, Li, Peng

arXiv.org Artificial IntelligenceAug-18-2023

Optical Music Recognition (OMR) is an important technology in music and has been researched for a long time. Previous approaches for OMR are usually based on CNN for image understanding and RNN for music symbol classification. In this paper, we propose a transformer-based approach with excellent global perceptual capability for end-to-end polyphonic OMR, called TrOMR. We also introduce a novel consistency loss function and a reasonable approach for data annotation to improve recognition accuracy for complex music scores. Extensive experiments demonstrate that TrOMR outperforms current OMR methods, especially in real-world scenarios. We also develop a TrOMR system and build a camera scene dataset for full-page music scores in real-world. The code and datasets will be made available for reproducibility.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2308.0937

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.64)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

RMSSinger: Realistic-Music-Score based Singing Voice Synthesis

He, Jinzheng, Liu, Jinglin, Ye, Zhenhui, Huang, Rongjie, Cui, Chenye, Liu, Huadai, Zhao, Zhou

arXiv.org Artificial IntelligenceMay-17-2023

We are interested in a challenging task, Realistic-Music-Score based Singing Voice Synthesis (RMS-SVS). RMS-SVS aims to generate high-quality singing voices given realistic music scores with different note types (grace, slur, rest, etc.). Though significant progress has been achieved, recent singing voice synthesis (SVS) methods are limited to fine-grained music scores, which require a complicated data collection pipeline with time-consuming manual annotation to align music notes with phonemes. Furthermore, these manual annotation destroys the regularity of note durations in music scores, making fine-grained music scores inconvenient for composing. To tackle these challenges, we propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input, eliminating most of the tedious manual annotation and avoiding the aforementioned inconvenience. Note that music scores are based on words rather than phonemes, in RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment. Furthermore, we propose the first diffusion-based pitch modeling method, which ameliorates the naturalness of existing pitch-modeling methods. To achieve these, we collect a new dataset containing realistic music scores and singing voices according to these realistic music scores from professional singers. Extensive experiments on the dataset demonstrate the effectiveness of our methods. Audio samples are available at https://rmssinger.github.io/.

machine learning, music score, natural language, (16 more...)

arXiv.org Artificial Intelligence

2305.10686

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > China (0.04)

Genre: Research Report (0.50)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

The Music Note Ontology

Poltronieri, Andrea, Gangemi, Aldo

arXiv.org Artificial IntelligenceMar-30-2023

In this paper we propose the Music Note Ontology, an ontology for modelling music notes and their realisation. The ontology addresses the relation between a note represented in a symbolic representation system, and its realisation, i.e. a musical performance. This work therefore aims to solve the modelling and representation issues that arise when analysing the relationships between abstract symbolic features and the corresponding physical features of an audio signal. The ontology is composed of three different Ontology Design Patterns (ODP), which model the structure of the score (Score Part Pattern), the note in the symbolic notation (Music Note Pattern) and its realisation (Musical Object Pattern).

artificial intelligence, information, ontology, (15 more...)

arXiv.org Artificial Intelligence

2304.00986

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.05)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.05)
(3 more...)

Genre: Research Report (0.82)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)

Add feedback

Proceedings of the 1st International Workshop on Reading Music Systems

Calvo-Zaragoza, Jorge, Hajič, Jan jr., Pacha, Alexander

arXiv.org Artificial IntelligenceDec-1-2022

The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 1st International Workshop on Reading Music Systems, held in Paris on the 20th of September 2018.

machine learning, pattern recognition, recognition, (17 more...)

arXiv.org Artificial Intelligence

2301.10062

Country:

North America > Canada > Quebec > Montreal (0.14)
Europe > Austria > Vienna (0.14)
Europe > Finland > Uusimaa > Helsinki (0.04)
(23 more...)

Genre:

Research Report (0.64)
Instructional Material (0.45)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Education > Curriculum > Subject-Specific Education (0.66)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management (1.00)
Information Technology > Communications (1.00)
(8 more...)

Add feedback

Proceedings of the 3rd International Workshop on Reading Music Systems

Calvo-Zaragoza, Jorge, Pacha, Alexander

arXiv.org Artificial IntelligenceDec-1-2022

The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.

machine learning, pattern recognition, recognition, (19 more...)

arXiv.org Artificial Intelligence

2212.00378

Country:

Europe > United Kingdom > England > Greater London > London (0.14)
Europe > Austria > Vienna (0.14)
Europe > Spain > Valencian Community > Alicante Province > Alicante (0.04)
(22 more...)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Overview (0.92)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Education > Curriculum > Subject-Specific Education (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management (1.00)
Information Technology > Human Computer Interaction (1.00)
(9 more...)

Add feedback